Quasi Newton Temporal Difference Learning

Authors

  • Arash Givchi
  • Maziar Palhang

Abstract

Fast-converging and computationally inexpensive policy evaluation is an essential part of reinforcement learning algorithms based on policy iteration. Algorithms such as LSTD, LSPE, FPKF and NTD converge faster but are computationally expensive. Conversely, algorithms such as TD, RG, GTD2 and TDC are computationally cheap but converge more slowly. This paper presents a regularized Quasi-Newton Temporal Difference (QNTD) learning algorithm that uses second-order information while maintaining a fast convergence rate. Put simply, we combine the idea of TD learning with the quasi-Newton algorithm SGD-QN. We explore the development of the QNTD algorithm and discuss its convergence properties. We support our ideas with empirical results on four standard benchmarks from the reinforcement learning literature: two small problems, Random Walk and Boyan chain, and two larger problems, cart-pole and linked-pole balancing. Empirical studies show that QNTD converges faster and is more accurate than conventional TD.
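
The abstract describes QNTD only at a high level: TD learning combined with SGD-QN, a quasi-Newton method that rescales stochastic gradient steps with a diagonal matrix. It does not give the exact update rule. As a rough illustration of the idea, the sketch below compares plain linear TD(0) with a diagonally preconditioned TD(0) on a Random Walk task. The preconditioner is an assumption made for illustration, not the authors' QNTD update and not SGD-QN's secant-based rule. It is an online estimate of the diagonal of A = E[φ(φ − γφ′)ᵀ], regularized by adding a constant λ. The task setup is also an assumption: the common 5-state version, starting in the centre state, with reward 1 on exiting right and γ = 1. The helper names (`features`, `run`, `rmse`), the step sizes and the value of λ are all hypothetical.

```python
import math
import random

N_STATES = 5  # states A..E, indexed 0..4
# True values for gamma = 1: probability of exiting on the right from state i.
TRUE_V = [(i + 1) / 6 for i in range(N_STATES)]


def features(s):
    """Hypothetical one-hot (tabular) features; None denotes a terminal state."""
    phi = [0.0] * N_STATES
    if s is not None:
        phi[s] = 1.0
    return phi


def run(episodes, alpha, precondition, lam=0.1, gamma=1.0, seed=0):
    """Linear TD(0); if precondition is True, each weight's step is divided by
    a regularized online estimate of the matching diagonal entry of A
    (an illustrative assumption, not the published QNTD rule)."""
    rng = random.Random(seed)
    theta = [0.5] * N_STATES
    diag_sum = [0.0] * N_STATES  # running sum of phi_i * (phi_i - gamma * phi'_i)
    steps = 0
    for _ in range(episodes):
        s = N_STATES // 2  # every episode starts in the centre state
        while s is not None:
            nxt = s + (1 if rng.random() < 0.5 else -1)
            reward = 1.0 if nxt == N_STATES else 0.0
            s_next = nxt if 0 <= nxt < N_STATES else None
            phi, phi_next = features(s), features(s_next)
            v = sum(t * f for t, f in zip(theta, phi))
            v_next = sum(t * f for t, f in zip(theta, phi_next))
            delta = reward + gamma * v_next - v  # TD error
            steps += 1
            for i in range(N_STATES):
                diag_sum[i] += phi[i] * (phi[i] - gamma * phi_next[i])
                scale = 1.0
                if precondition:
                    # Clip the estimate at zero and add lam so the scale stays finite.
                    scale = 1.0 / (max(diag_sum[i] / steps, 0.0) + lam)
                theta[i] += alpha * scale * delta * phi[i]
            s = s_next
    return theta


def rmse(theta):
    """Root-mean-square error of the value estimates against TRUE_V."""
    return math.sqrt(sum((t - v) ** 2 for t, v in zip(theta, TRUE_V)) / N_STATES)
```

A usage example is `rmse(run(2000, alpha=0.05, precondition=False))` versus `rmse(run(2000, alpha=0.01, precondition=True))`. The preconditioned run uses a smaller base step because dividing by the estimated diagonal (roughly each state's visit frequency plus λ) enlarges the effective step for rarely visited states. This per-coordinate rescaling is the kind of curvature information a diagonal quasi-Newton method exploits.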

Similar sources

Effective sketching methods for value function approximation

High-dimensional representations, such as radial basis function networks or tile coding, are common choices for policy evaluation in reinforcement learning. Learning with such high-dimensional representations, however, can be expensive, particularly for matrix methods, such as least-squares temporal difference learning or quasi-Newton methods that approximate matrix step-sizes. In this work, we...

Iterative learning control based on quasi-Newton methods

In this paper we propose an iterative learning control scheme based on the quasi-Newton method. The iterative learning control is designed to improve the performance of the systems working cyclically. We consider the general type of systems described by continuously differentiable operator acting in Banach spaces. The sufficient conditions for the convergence of quasi-Newton iterative learning alg...

On the convergence speed of artificial neural networks in the solving of linear systems

Artificial neural networks have the advantages such as learning, adaptation, fault-tolerance, parallelism and generalization. This paper is a scrutiny on the application of diverse learning methods in speed of convergence in neural networks. For this aim, first we introduce a perceptron method based on artificial neural networks which has been applied for solving a non-singula...

Quasi-Newton Methods: A New Direction

Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression u...

Learning geometric combinations of Gaussian kernels with alternating Quasi-Newton algorithm

We propose a novel algorithm for learning a geometric combination of Gaussian kernels jointly with a SVM classifier. This problem is the product counterpart of MKL, with restriction to Gaussian kernels. Our algorithm finds a local solution by alternating a Quasi-Newton gradient descent over the kernels and a classical SVM solver over the instances. We show promising results on well known data se...

Publication date: 2014